This paper is concerned with the utilization of speech waveform
periodicities in differential pulse code modulation (DPCM) coding with
2-bit adaptive quantization and time-invariant spectrum prediction.
Our work is based on computer simulations of DPCM coders. We have
studied pitch detectors based on autocorrelation and an average
magnitude difference function (AMDF), and we have measured the
benefits of predicting from a previous pitch period as functions of
pitch-period-updating frequency and periodicity-indicating thresholds
(for autocorrelation and the AMDF). We have compared several
alternative methods of utilizing past quantized samples (in the present
and previous pitch periods) for providing speech sample predictions. We
find the following combination to be attractive for waveform coding at
bit rates in the neighborhood of 16 kb/s: 2-bit adaptive quantization
with a one-word (2-bit DPCM word) memory, pitch detection performed
on unquantized speech (preferably with an AMDF criterion), and a
prediction scheme that uses fixed three-tap (short-term) prediction
for nonperiodic waveform segments but switches to an appropriate
one-tap (long-term) predictor upon the detection of strong periodicity.
With four sample utterances, the latter procedure results in an average
SNR (signal-to-noise ratio) gain of 3.75 dB over a non-pitch-adaptive
encoder.

I. INTRODUCTION 

An important subclass of speech waveform encoders is characterized
by the use of adaptive quantization and predictive (DPCM) encoding
(Ref. 1). Time-invariant spectrum predictors are simple to implement and
robust in the context of coarse quantization. The benefits of adaptive
prediction are, however, well recognized and documented (Refs. 2, 3),
and the greatest achievements in bit-rate reduction have in fact
depended on the use of adaptive short-term (spectrum) prediction as
well as adaptive long-term (pitch) prediction, as seen in the paper by
Atal and Schroeder (Ref. 4).

This paper is concerned with the relatively less documented
combination of adaptive pitch prediction and nonadaptive spectrum
prediction. The study of this kind of prediction is motivated by the
observation that speech waveforms abound in highly periodic segments and
by the conjecture that the use of this periodicity may provide a
prediction potential that is substantial enough to obviate the need for
adaptive short-term (spectrum) prediction. The attraction of this
approach will evidently depend on the complexity of pitch detection
itself. The pitch detectors used in this paper are based on
autocorrelation and AMDF (average magnitude difference function) and are
quite simple to implement; they are indeed much simpler than the
mean-squared-error-minimizing pitch detector described in Ref. 4.
Moreover, as discussed in Section IV, the success of pitch-adaptive DPCM
does not depend critically on accurate pitch detection in the sense in
which the term is used in formal speech research (Ref. 5).

A thesis by Trottier (Ref. 6) considers the possibility of simplifying
the Atal-Schroeder encoder (Ref. 4). Among other things, this thesis
discusses simple pitch-detection algorithms, the criticality of a
well-designed adaptive quantizer, and the inefficiency of approaches
seeking to simplify adaptive spectrum prediction through the use of very
few predictor taps, say two. An unpublished work of Grizmala (Ref. 7)
provides one of the first proposals for a simple pitch-adaptive DPCM
that entirely avoids adaptive spectrum prediction. Grizmala discusses
AMDF-based pitch detection and fixed three-tap spectrum prediction for
nonperiodic waveform segments. More recently, Xydeas and Steele report
an instance of a 6-dB SNR gain for a fixed-spectrum DPCM encoder arising
from the utilization of waveform periodicities (Ref. 8). Finally, the
detection of periodicity based on autocorrelation and AMDF is documented
in speech papers (Refs. 5, 9, 10) as well as in coding literature
(Ref. 11).

One of the contributions of the present paper is the demonstration
that fixed-spectrum pitch-adaptive DPCM is useful in the context of a
specific type of adaptive quantizer that has received considerable
attention in recent coding work (Refs. 12, 13). This paper also shows
that AMDF-based pitch detection is slightly more effective than an
autocorrelation-based procedure. The paper also demonstrates that,
during periodic waveform segments, a simple one-tap predictor across the
pitch period is more efficient than several multitap predictors
involving many past samples in the present and previous pitch periods.
Finally, the paper includes formal measurements of pitch prediction gain
as a function of (i) pitch-period-update frequency and (ii) thresholds
that the AMDF and correlation functions should cross for a waveform
segment to be judged as periodic. Our results are all based on computer
simulations of DPCM encoders.

The results of this paper are expected to be relevant to speech
waveform coding at bit rates on the order of 16 kb/s. At this bit rate,
the use of fixed spectrum prediction and adaptive quantization results
typically in a quantization noise level that is quite easily perceived,
while the sophistication of adaptive spectrum prediction is often
unwarranted, because undesirable quantizer-predictor interactions begin
showing up at around 16 kb/s in practical waveform coder designs
(Refs. 14, 15). Adaptive pitch prediction, on the other hand, appears to
be a useful and robust sophistication at 16 kb/s. With this bit rate in
mind, this paper will deal exclusively with two-bit quantizers for the
DPCM coding of Nyquist-sampled (8-kHz) telephone-quality (200-3200 Hz)
speech. Our numerical results refer to two female utterances, "The
chairman cast three votes" and "The boy was mute about his task," and
two male utterances, "A lathe is a big tool" and "The boy was mute
about his task." These utterances will henceforth be labeled F1, F2,
M1, and M2.

The organization of the paper is as follows. Section II recommends
a slowly adaptive quantizer with a one-word memory, and Section III
proposes a three-tap spectrum predictor. Section IV discusses pitch
detection by means of AMDF- and autocorrelation-type procedures, and
points out how pitch analysis can be performed either on quantized
speech or on the original unquantized speech. Section V compares
different prediction algorithms for periodic segments, including the
important example of an appropriate one-tap predictor. Section VI
measures the gains of pitch-adaptive DPCM as a function of (i) the
pitch-detection procedure, (ii) AMDF and autocorrelation thresholds used
in hypothesizing periodicity, (iii) pitch-period-updating time, and (iv)
prediction algorithms used for periodic waveform segments. Section VII
summarizes performance figures for the four sample sentences and
discusses results in the context of 16-kb/s waveform coding.

II. TWO-BIT ADAPTIVE QUANTIZER 

Figure 1 shows a uniform four-level quantizer used for pitch-adaptive
DPCM coding. The step-size $\Delta$ is adaptive. The adaptations are
based on a one-word memory (Refs. 12, 13). Specifically, the step-size
is modified at every sampling instant by a multiplier that depends only
on whether the magnitude of the previous quantizer output was
$0.5\Delta_r$ or $1.5\Delta_r$. Respective step-size multipliers make
$\Delta_{r+1} = E_1 \cdot \Delta_r$ or $E_2 \cdot \Delta_r$. In the
context of quantizing prediction errors across a pitch period, we have
found that the most useful adaptations were 'slow' adaptations of the
form (Ref. 12):

$$E_1 = 0.95; \quad E_2 = 1.10. \tag{1}$$

Fig. 1. A 2-bit adaptive quantizer.

As discussed at length in Ref. 12, values of optimal step-size
multipliers reflect the nature of the input signal spectrum and the
stationarity of the input variance. The step-size adaptations were
subject to maximum and minimum values that were appropriate for the
given peak speech amplitude of ±1024:

$$\Delta_{\max} = 192, \quad \Delta_{\min} = 1.5. \tag{2}$$
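The adaptation rule of eqs. (1) and (2) can be sketched in a few lines
of Python. This is a minimal illustration and not the simulation program
used for the results in this paper: the function names are hypothetical,
and the decision threshold at a magnitude of one step-size is assumed as
the usual choice for a uniform four-level quantizer.

```python
# Illustrative sketch of the 2-bit adaptive quantizer (names are hypothetical).
E1, E2 = 0.95, 1.10                  # step-size multipliers, eq. (1)
DELTA_MAX, DELTA_MIN = 192.0, 1.5    # step-size limits, eq. (2)


def quantize(e, delta):
    """Map a prediction error e to one of the levels +/-0.5*delta, +/-1.5*delta.

    Assumption: the inner/outer decision threshold sits at |e| = delta.
    """
    magnitude = 0.5 * delta if abs(e) < delta else 1.5 * delta
    return magnitude if e >= 0 else -magnitude


def next_step(delta, eq):
    """One-word-memory adaptation driven by the previous quantizer output eq.

    The step shrinks after an inner level (0.5*delta) and grows after an
    outer level (1.5*delta), then is clamped to the limits of eq. (2).
    """
    multiplier = E1 if abs(eq) < delta else E2
    return min(DELTA_MAX, max(DELTA_MIN, delta * multiplier))
```

Because the multiplier depends only on the transmitted 2-bit word, a
receiver can track the same step-size without side information.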

Finally, nonuniform quantizers were not found to be very effective
in pitch-adaptive DPCM using adaptive quantization. This has to do with
the effect of DPCM predictions on the probability density function (PDF)
at the quantizer input: predictions in DPCM cause a quantizer-input PDF
that is more Gaussian than the PDF of the original speech amplitudes.
The latter, for example, can be modelled by a gamma PDF, for which
nonuniform quantization is very useful (Refs. 2, 3).

III. TIME-INVARIANT SPECTRUM PREDICTION 

A T-tap spectrum predictor is represented by

$$\hat{X}_r = \sum_{s=1}^{T} a_s \cdot XQ_{r-s}, \tag{3}$$

where $X$ and $XQ$ refer to input and quantized speech samples, and
$\hat{X}_r$ is the prediction of the current sample $X_r$.

In time-invariant (fixed) prediction, the coefficients $a_s$ are matched
to the long-term spectrum of speech via the corresponding
autocorrelation function, as described in Ref. 1.



442 THE BELL SYSTEM TECHNICAL JOURNAL, MARCH 1977 



Using a typical long-term spectrum characterization (Ref. 7), the
following designs have been used for fixed one-tap and three-tap
spectrum predictors:

$$a_1 = 0.85 \quad \text{for } T = 1 \tag{4}$$

and

$$a_1 = 1.10; \quad a_2 = -0.28; \quad a_3 = -0.08 \quad \text{for } T = 3. \tag{5}$$

These predictor coefficients are rounded values resulting from a
spectrum model where the speech autocorrelations are 0.825, 0.562, and
0.308 for delays of one, two, and three 8-kHz samples, respectively.
These autocorrelations are reported in Ref. 16 as the result of a study
on a very large speech-sample base, and constitute slight revisions of
very similar autocorrelations reported in Ref. 17.
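As a check on the three-tap design, the normal equations can be solved
for the quoted autocorrelations. The sketch below uses the
Levinson-Durbin recursion, which is one standard way of solving them
and is not necessarily the procedure followed in Ref. 1 or Ref. 7; the
function name is illustrative.

```python
def levinson_durbin(r):
    """Return predictor taps a_1..a_T for autocorrelations r[0..T].

    r[0] is the zero-lag value (1.0 for normalized autocorrelations).
    """
    a = []          # taps of the current order
    err = r[0]      # prediction-error power of the current order
    for m in range(1, len(r)):
        # Reflection coefficient for order m.
        k = (r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))) / err
        # Update the lower-order taps, then append the new tap.
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a


taps = levinson_durbin([1.0, 0.825, 0.562, 0.308])
print([round(t, 2) for t in taps])
```

Rounded to two decimal places, the resulting taps agree with the
three-tap coefficients of eq. (5).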

In coding our speech waveforms, the three-tap predictor provided a 
typical SNR gain of nearly 1 dB over the one-tap predictor. Spectrum 
predictions in this paper will henceforth refer to a time-invariant 
three-tap design, as in eq. (5). 

IV. MEASUREMENT OF PITCH PERIOD 

This section defines the AMDF- and autocorrelation-based pitch
measurements used in our work, discusses the use of unquantized speech
samples $X$ or quantized samples $XQ$ for the pitch analysis, and
provides illustrations of pitch measurements. In general, pitch analysis
will be based on a window W containing W contiguous speech samples $Z$
($Z = X$ or $XQ$). The sampling instant when a pitch period is measured
is denoted by r, so that a current speech sample will be $Z_r$ ($X_r$ or
$XQ_r$, as appropriate). The pitch period is denoted by P, and P is
assumed to have minimum and maximum values $P_{\min}$ and $P_{\max}$,
respectively. $G_1$ and $G_2$ are thresholds that can be used to
hypothesize waveform periodicity with varying degrees of confidence. V
is the pitch-period updating time (see Section VI).

4.1 AMDF-based pitch measurement 
Consider the average magnitude difference function

$$\mathrm{AMDF}(p) = \mathrm{AVERAGE}\,|Z_u - Z_{u-p}|; \quad p = P_{\min}, P_{\min}+1, \ldots, P_{\max}, \tag{6}$$

where the averaging is over all pairs $(u, u-p)$ such that both $Z_u$
and $Z_{u-p}$ are in W.

The AMDF pitch detector estimates the pitch period P to be
$P = p_{\mathrm{est}}$ if

$$\mathrm{AMDF}(p_{\mathrm{est}}) < \mathrm{AMDF}(p) \tag{7}$$

for all p in the range $(P_{\min}, P_{\max})$ with the exception of
$p_{\mathrm{est}}$, and if

$$\mathrm{AMDF}(p_{\mathrm{est}}) < G_1 \cdot \mathrm{AVERAGE}(|Z_u|) \quad \text{for all } u \text{ in W}. \tag{8}$$

The value of $G_1$ is discussed in detail in Section VI. Typically,
$G_1 = 0.5$. With Nyquist-sampled (8-kHz) speech and for a single
pitch-analysis procedure that should cover the expected range of p in
both male and female speech, the following numbers seem appropriate
(Ref. 5):

$$P_{\min} = 16, \quad P_{\max} = 160, \quad W = 256. \tag{9}$$

Notice that $P_{\min}$ excludes the obvious minimum AMDF(0) at p = 0,
and that the window length W is well in excess of the maximum
anticipated pitch period $P_{\max}$. It turns out that this requirement
($W > P_{\max}$) is quite important for efficient pitch prediction and
waveform coding. The range of the pitch-period search
($16 \le p \le 160$) is wide enough to cause frequent problems with
multiple minima in the AMDF function, and multiples of the fundamental
pitch period are often picked up as P. Fortunately, however, this kind
of error in pitch tracking appears to be quite harmless as far as
pitch-adaptive waveform coders are concerned: the need is for a sequence
of waveform samples $\{XQ\}$ that provide good predictions of a current
sequence $\{X\}$ in periodic segments, and it seems to be immaterial
whether $\{X\}$ and $\{XQ\}$ are one pitch period apart or n (>1) pitch
periods apart.
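The AMDF test of eqs. (6) through (8) can be sketched as follows. The
function names are hypothetical, the window is passed in as a plain list
of W samples, and ties between equal minima are resolved by taking the
smallest lag, which is an assumption since eq. (7) presumes a unique
minimum.

```python
def amdf(z, p):
    """Average |z[u] - z[u-p]| over all pairs lying inside the window z, eq. (6)."""
    diffs = [abs(z[u] - z[u - p]) for u in range(p, len(z))]
    return sum(diffs) / len(diffs)


def amdf_pitch(z, g1=0.5, p_min=16, p_max=160):
    """Return the estimated pitch period, or None if the window is not periodic.

    Defaults follow eq. (9) and the typical G_1 quoted in the text.
    """
    scores = {p: amdf(z, p) for p in range(p_min, p_max + 1)}
    p_est = min(scores, key=scores.get)               # condition (7)
    mean_magnitude = sum(abs(v) for v in z) / len(z)
    if scores[p_est] < g1 * mean_magnitude:           # condition (8)
        return p_est
    return None
```

For a strictly periodic input the minimum may land on any multiple of
the true period inside the search range, which is the harmless kind of
error described above.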

4.2 Autocorrelation-based pitch measurement 

Consider the autocorrelation function

$$C(p) = \mathrm{AVERAGE}(\operatorname{sgn} Z_u \cdot \operatorname{sgn} Z_{u-p}); \quad p = P_{\min}, P_{\min}+1, \ldots, P_{\max}, \tag{10}$$

where the averaging is over all pairs $(u, u-p)$ such that both $Z_u$
and $Z_{u-p}$ are in W and, furthermore, both $|Z_u|$ and $|Z_{u-p}|$
exceed an appropriate speech-clipping level

$$Z_{\mathrm{clip}} = 0.64\,\mathrm{MAX}\left(|Z|^{(1)}_{\max},\ |Z|^{(3)}_{\max}\right), \tag{11}$$

where $|Z|^{(1)}_{\max}$ is the maximum speech magnitude in the first
one-third part of W and $|Z|^{(3)}_{\max}$ is the maximum speech
magnitude in the third one-third part of W.

The autocorrelation pitch detector estimates the pitch period P to
be

$$P = p_{\mathrm{est}} \tag{12}$$

if

$$C(p_{\mathrm{est}}) > C(p)$$

for all p in the range $(P_{\min}, P_{\max})$ with the exception of
$p_{\mathrm{est}}$, and if

$$C(p_{\mathrm{est}}) > G_2. \tag{13}$$

The role of $G_2$ is discussed at length in Section VI. Typically,
$G_2 = 0.2$. Appropriate values of $P_{\min}$, $P_{\max}$, and W follow
(9). The nonzero value of $P_{\min}$ excludes the obvious maximum C(0)
at p = 0.

The center-clipping operation described by (11) is quite effective in
mitigating spurious peaks in the C(p) function, such as peaks
representing a low first-formant frequency. Typically, autocorrelation
pitch detectors work with speech that is low-pass filtered to, say,
900 Hz (Ref. 5), but such filtering was not used in our waveform coding
program.

The pitch-measurement techniques based on (6) and (10), especially
the autocorrelation method (10), are easier to implement than the
mean-squared-error-minimizing pitch detector described in Ref. 4, which
is based on computing the autocorrelation of Z [this involves computing
products of real numbers, instead of taking differences as in (6) or
using one-bit numbers as in (10)]. The efficacies of AMDF- and
autocorrelation-based pitch detectors have recently been calibrated in
terms of the performance of several other pitch-tracking procedures
(Ref. 5).
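The clipped sign-correlation test of eqs. (10) through (13) can be
sketched in the same style. The function names are hypothetical, and two
details are assumptions not stated in the text: a lag with no qualifying
sample pairs is scored as 0, and ties between equal maxima are resolved
by taking the smallest lag.

```python
def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def clipped_sign_autocorr(z, p, z_clip):
    """Average sgn(z[u])*sgn(z[u-p]) over pairs whose magnitudes both exceed z_clip."""
    terms = [sign(z[u]) * sign(z[u - p])
             for u in range(p, len(z))
             if abs(z[u]) > z_clip and abs(z[u - p]) > z_clip]
    return sum(terms) / len(terms) if terms else 0.0   # assumption: empty lag scores 0


def autocorr_pitch(z, g2=0.2, p_min=16, p_max=160):
    """Return the estimated pitch period, or None if the window is not periodic."""
    third = len(z) // 3
    # Clipping level of eq. (11): maxima over the first and last thirds of the window.
    z_clip = 0.64 * max(max(abs(v) for v in z[:third]),
                        max(abs(v) for v in z[-third:]))
    scores = {p: clipped_sign_autocorr(z, p, z_clip)
              for p in range(p_min, p_max + 1)}
    p_est = max(scores, key=scores.get)                # maximum of C(p)
    return p_est if scores[p_est] > g2 else None       # condition (13)
```

Only comparisons and one-bit products are needed per pair, which is the
implementation advantage noted above.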

4.3 Pitch analyses based on X and XQ 

Figure 2a demonstrates pitch analysis based on original, unquantized
speech samples $X$. We see how the analysis window can be aligned so as
to extend equally on either side of the current sample $X_r$ to be
encoded (quantized); such alignment turns out to be quite critical for
realizing the maximum potential of pitch-adaptive waveform coders.

[Figure 2: (a) analysis window W extending from $X_{r+1-W/2}$ to
$X_{r+W/2}$, with $X_r$ at its center; (b) analysis window W extending
from $XQ_{r-W}$ to $XQ_{r-1}$, immediately preceding $X_r$.]

Fig. 2. Pitch analysis based on (a) unquantized speech X and (b)
quantized speech XQ.

Table I. Local and global minima/maxima in pitch-period search
($G_1 = 0.84$, $G_2 = 0.2$; speech sample: M2; analysis based on
unquantized speech)

   p*     Minimization of     Maximization of
          normalized AMDF     autocorrelation
   29           -                  0.33
   30          0.66                0.34
   31           -                  0.35
   34          0.62                 -
   37           -                  0.38
   38           -                  0.39
   95           -                  0.45
   96          0.40                0.46

* Pitch-period estimate = 96 samples

Figure 2b shows the analysis of pitch based purely on past quantized
samples $XQ_{r-s}$ ($s > 0$). Figures 2a and 2b apply equally to AMDF or
autocorrelation analysis.

4.4 Illustrative measurements of pitch 

Table I demonstrates examples of AMDF- and autocorrelation-based
searches for the pitch period P. Entries in the table represent those
local minima/maxima in the AMDF/C functions that were below/above all
previous local minima/maxima in the search for P ($16 \le p \le 160$).
Also, only those minima/maxima that cross the $G_1$/$G_2$ thresholds,
eqs. (8) and (13), are listed. For both the AMDF and C functions, the
global extremum (a minimum for the AMDF, a maximum for C) appears at the
pitch period P = 96.

Table II provides a typical time plot of P (number of 8-kHz samples)
for four different pitch-tracking techniques. The analysis refers to a
sample segment from the utterance F1. Notice the remarkable closeness
of X-based contours in columns 1 and 3. Notice also that with XQ-based
analyses, the AMDF function tends to preserve pitch information much
better than the autocorrelation measurement.

V. PREDICTION ALGORITHMS FOR PERIODIC WAVEFORMS 

Figure 3 sketches a periodic waveform segment. P is the 'pitch period',
$X_r$ is a current waveform sample to be encoded, and $XQ$ denotes an
already quantized sample in the present 'pitch period' or in an earlier
'very similar segment' of the periodic waveform.

Our prediction algorithms for periodic waveforms are linear, and they
are of the general form

$$\hat{X}_r = \sum_{u=1}^{3} a_u \cdot XQ_{r-u} + \sum_{v=0}^{3} a_{P+v} \cdot XQ_{r-P-v}. \tag{14}$$
Table II. Pitch-period contours from four pitch-tracking techniques
(speech sample: F1). Entries along columns are successive values of P
(number of 8-kHz samples)

  AMDF     AMDF     Autocorrelation   Autocorrelation
  of X     of XQ         of X              of XQ
    2        2             2                 19
   39       39             2                 19
   78       39            78                 19
   39       39            39                 19
   39       39            39                 39
   39       39            39                 38
   39       39            39                 39
   39        2            39                 41
   43        2            43                 44
   40        2            40                 35
   41       42            41                 25
  132      132           132                  2
  134      134           134                  2
  135      134           135                  2
   57      135            57                  2
   78†      80            78                 48
  157      157           157                 50
   35       35             2                 19
    2        2             2                 19
    2        2             2                 19
    2        2             2                  2
    2        2             2                 18
    2        2             2                  2
    2        2             2                 18
    2        2             2                  2
    2        2             2                  2
    2        2             2                  2
    2        2             2                  2
    2        2             2                 31
   35        2            35                 34
   35        2            35                 34
   35       35            35                 35
   36       35            36                 36
   36       35            36                 36
   36       36            36                 36
   37       36            37                 37
   37       37            37                 37
   37       37            37                 37
   37       37            37                 37
   37       37            37                 37
   37       37            37                 37
   37       37            37                 37
   75       37            75                 37
   75†      75            37                 37
   37       75            37                 37

† Second digit not legible in the source; value inferred from the
neighboring entries.

We have considered many special cases of the general algorithm (14);
Table III summarizes three interesting examples.

The seven-tap predictor attempts a clever combination of spectrum
prediction [see (5) in Section III] and pitch prediction. This approach
was proposed by Grizmala (Ref. 7), who in turn was simplifying a formal
procedure of Atal and Schroeder (Ref. 4). The three-tap predictor is the
simplest nontrivial combination of the two types of prediction. It is
suggested by a simple geometrical procedure of completing an idealized
parallelogram with vertices at the topmost four dots in Fig. 3. Finally,
the one-tap predictor is the simplest approach to pitch-adaptive coding
and is suggested by the very strong correlations that are observed
between $X_r$ and $X_{r-P}$ in highly periodic waveform segments.

Fig. 3. Prediction algorithms for periodic waveforms.
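The general form (14) and the special cases of Table III can be written
as a small lookup of tap values. This is an illustrative sketch: the
dictionary layout and function name are hypothetical, and the
coefficients are those listed in Table III, with unlisted taps taken as
zero.

```python
# Taps (a_1, a_2, a_3) act on the present period; taps (a_P .. a_{P+3})
# act on the previous period. Values as listed in Table III.
TABLE_III = {
    "averager": {"near": (0.5, 0.0, 0.0), "far": (0.5, 0.0, 0.0, 0.0)},
    "7-tap": {"near": (1.1, -0.28, -0.08), "far": (1.0, -1.1, 0.28, 0.08)},
    "3-tap": {"near": (1.0, 0.0, 0.0), "far": (1.0, -1.0, 0.0, 0.0)},
    "1-tap": {"near": (0.0, 0.0, 0.0), "far": (1.0, 0.0, 0.0, 0.0)},
}


def predict(xq, r, period, name):
    """Evaluate eq. (14) at index r from already-quantized samples xq.

    Requires r - period - 3 >= 0 so that every referenced sample exists.
    """
    taps = TABLE_III[name]
    near = sum(a * xq[r - u] for u, a in enumerate(taps["near"], start=1))
    far = sum(a * xq[r - period - v] for v, a in enumerate(taps["far"]))
    return near + far
```

For an exactly periodic sequence the 1-tap, 3-tap, and 7-tap
combinations all reproduce the sample one period earlier, whereas the
averager does so only when the previous sample equals it.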

VI. DESIGN AND PERFORMANCE OF PITCH-ADAPTIVE DPCM CODER 

Figure 4 provides a block diagram of the pitch-adaptive DPCM coder.
It is different from conventional DPCM (Ref. 1) in the inclusion of a
special predictor for encoding the periodic segments of the input
waveform. The spectrum predictor is formally defined by (5) and the
pitch predictor by (14). The switching between the two predictors is
controlled by the crossings of appropriate thresholds $G_1$ and $G_2$
(Section IV) by the AMDF or autocorrelation functions, respectively. The
test for periodicity is done once every V samples. If the waveform is
decided to be "periodic" as a result of the test, the pitch period P
(coming out of the AMDF or autocorrelation measurement) is used in the
predictive encoding of a current block of V samples. (Both the binary
"periodic/nonperiodic" decision and the pitch period, if any, are
updated for the next block of V samples.)
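The block-switched operation described above can be sketched as a single
encoding loop. This is a simplified illustration under stated
assumptions, not the simulation program used here: the function name,
the initial step-size of 16, the decision threshold at one step-size,
and the fallback to spectrum prediction when fewer than P past samples
exist are all hypothetical choices. The per-block pitch decisions are
supplied by the caller.

```python
def encode(x, period_for_block, v_len=64):
    """Return the locally decoded samples XQ of a pitch-adaptive DPCM coder.

    period_for_block[b] is the pitch period P for block b, or None when the
    block has been judged nonperiodic.
    """
    a = (1.10, -0.28, -0.08)   # fixed three-tap spectrum predictor, eq. (5)
    delta = 16.0               # initial step-size (assumed)
    xq = []
    for r, sample in enumerate(x):
        period = period_for_block[r // v_len]
        if period is not None and r - period >= 0:
            pred = xq[r - period]                       # one-tap pitch predictor
        else:
            pred = sum(a[s] * xq[r - 1 - s]             # spectrum predictor
                       for s in range(3) if r - 1 - s >= 0)
        e = sample - pred
        outer = abs(e) >= delta                         # outer or inner level
        eq = (1.5 if outer else 0.5) * delta * (1 if e >= 0 else -1)
        xq.append(pred + eq)
        # One-word-memory step-size adaptation, eqs. (1) and (2).
        delta = min(192.0, max(1.5, delta * (1.10 if outer else 0.95)))
    return xq
```

A decoder that receives the 2-bit words and, for X-based analysis, the
pitch decisions can run the same loop to reconstruct XQ.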

6.1 SNR, SNRV, and SNRSEG

The design and utility of pitch-adaptive coders will be discussed using
the following signal-to-noise ratio as a performance criterion:

$$\mathrm{SNR\,(dB)} = 10 \log_{10} \left[ \sum_{r=1}^{N} X_r^2 \Big/ \sum_{r=1}^{N} (X_r - XQ_r)^2 \right], \tag{15}$$

where N is the total number of input samples.

Table III. Three prediction algorithms for periodic waveforms

  Name of
  predictor    a_1     a_2     a_3    a_P    a_{P+1}   a_{P+2}   a_{P+3}
  Averager     0.5      -       -     0.5       -         -         -
  7-Tap        1.1    -0.28   -0.08   1       -1.1       0.28      0.08
  3-Tap        1        -       -     1       -1          -         -
  1-Tap         -       -       -     1         -         -         -

In deference to the fact that the pitch-adaptive coding is performed
in blocks of V samples, we consider an additional measure of
performance for the Sth block:

$$\mathrm{SNRV}(S)\,\mathrm{(dB)} = 10 \log_{10} \left[ \sum_{r=V(S-1)+1}^{VS} X_r^2 \Big/ \sum_{r=V(S-1)+1}^{VS} (X_r - XQ_r)^2 \right]. \tag{16}$$

The average value of SNRV over the total input signal duration (over
N/V input blocks) will be called the 'segment-signal-to-noise ratio'
SNRSEG (Ref. 18):

$$\mathrm{SNRSEG} = \frac{1}{N/V} \sum_{S=1}^{N/V} \mathrm{SNRV}(S). \tag{17}$$
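The three measures of eqs. (15) through (17) can be computed as follows.
The function names are hypothetical, and the sketch assumes that every
block contains nonzero signal and nonzero noise energy, since the
logarithm is otherwise undefined; a trailing partial block is ignored.

```python
import math


def snr_db(x, xq):
    """Conventional SNR of eq. (15); also SNRV of eq. (16) when given one block."""
    signal = sum(v * v for v in x)
    noise = sum((v - q) ** 2 for v, q in zip(x, xq))
    return 10.0 * math.log10(signal / noise)


def snrseg_db(x, xq, v_len=64):
    """Average of the per-block SNRV values in dB, eq. (17)."""
    starts = range(0, len(x) - v_len + 1, v_len)
    snrv = [snr_db(x[b:b + v_len], xq[b:b + v_len]) for b in starts]
    return sum(snrv) / len(snrv)
```

Because SNRSEG averages decibel values block by block, a loud block and
a quiet block carry equal weight, whereas in eq. (15) the loud block
dominates both sums.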



SNRV is an obvious indicator of local encoding quality; its average
value SNRSEG reflects aspects of quantizer performance that do not
always come out from the conventional SNR measure (Ref. 18). For
example, the time variation of SNRV would provide an appropriate
indication of the differential treatment of voiced and unvoiced waveform
segments (this is seen in Fig. 5); also, occasional large samples of
SNRV (associated with pitch-adaptive coding of highly periodic segments)
would have a better chance of showing up in the final result if the
performance measure is SNRSEG, rather than the conventional SNR.

Fig. 4. Block diagram of pitch-adaptive coder.

Table IV. Comparison of prediction algorithms (utterance: F1; number of
blocks: 134; block length V: 64; pitch detector: based on unquantized
speech and AMDF; $G_1$ = 0.71)

  Predictor       Averager    7-Tap    3-Tap    1-Tap
  SNR (dB)          10.3      13.1     13.3     14.4
  SNRSEG (dB)       15.0      16.5     16.8     16.8

6.2 Comparison of the prediction algorithms of Table III 

Table IV compares the performances of the four predictors in Table
III for the DPCM encoding of a typical portion of utterance F1. It is
very interesting that the simplest of these predictors, the one-tap
predictor, provides the best encoding. In fact, the rest of this paper
will uniformly assume an appropriate one-tap predictor for periodic
segments.

6.3 Choice of decision thresholds $G_1$ and $G_2$

Table V illustrates AMDF-based coding as a function of the
periodicity-decision threshold $G_1$ [see (8)]. A choice of $G_1 = 0.84$
appears to provide the best combination of SNR and SNRSEG. This value of
$G_1$ corresponds to a 1.5-dB prediction gain [ratio of average
magnitude of input X to average magnitude of prediction error e (see
Fig. 4)]. The value of $G_1 = 0.71$ (corresponding to a 3-dB prediction
gain) provides a performance that is very close to the maximum. In fact,
Grizmala (Ref. 7) recommends the latter value of $G_1 = 0.71$.
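The correspondence between the threshold and the quoted prediction
gains can be checked numerically. The sketch assumes that the gain in
decibels is 20 log10 of the magnitude ratio 1/$G_1$, an interpretation
adopted here because it reproduces both quoted figures; the function
name is hypothetical.

```python
import math


def threshold_to_gain_db(g1):
    """Prediction gain in dB implied by an AMDF threshold g1.

    Assumes gain = 20*log10(mean |X| / mean |e|) with mean |e| = g1 * mean |X|.
    """
    return 20.0 * math.log10(1.0 / g1)


for g1 in (0.84, 0.71):
    print(g1, round(threshold_to_gain_db(g1), 1))
```

Under this reading, a threshold of 1.0 corresponds to no prediction gain
at all, which is why it admits segments that pitch prediction does not
help.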

Table VI shows corresponding results for autocorrelation-based DPCM
with $G_2$ as parameter. One notes a broad optimum, with $G_2 = 0.2$
representing a reasonable autocorrelation threshold for hypothesizing
periodicity; it is interesting that an SNRSEG criterion would dictate
$G_2 = 0.4$.

Table V. Effect of $G_1$ on AMDF-based pitch-adaptive DPCM (all
parameters are the same as for Table I except that $G_1$ is now a
variable)

  G_1            (not legible)    0.50    0.71    0.84    1.0
  SNR (dB)            9.3         14.2    14.2    14.4    14.5
  SNRSEG (dB)        12.5         15.2    16.6    16.8    14.9

Table VI. Effect of $G_2$ on autocorrelation-based pitch-adaptive DPCM
(all parameters are the same as for Table I except that the pitch
detection is now correlation-based)

  G_2             0.1     0.2     0.3     0.4     0.6
  SNR (dB)        13.6    13.8    13.6    13.3    10.3
  SNRSEG (dB)     14.6    14.5    14.3    15.8    14.3

6.4 Comparison of pitch detectors: AMDF vs autocorrelation; X-analysis vs
XQ-analysis

Table VII compares, for optimal settings of $G_1$ and $G_2$, the
encoding performances of AMDF- and autocorrelation-based pitch
measurements. Notice the slight superiority of the AMDF approach,
especially from an SNRSEG point of view. Notice also that pitch analyses
based on X (Fig. 2a) are distinctly superior to those based on quantized
speech XQ (Fig. 2b). Finally, it is very significant that, in the case
of XQ-based analyses, the value of SNRSEG is 3 to 5 dB higher than that
of SNR. This indicates that even with XQ-based designs, many periodic
segments get encoded very well in a short-term sense (leading frequently
to very good SNRV values that tend to boost the average SNRV value,
SNRSEG). The above observation has been confirmed in informal listening
tests. These tests have also shown that the quantization noise in
XQ-based AMDF coding tends to be "whiter" than the noise obtained with
the other three pitch-detection schemes of Table VII.

6.5 Pitch-period update-time V 

Table VIII shows coder performance as a function of how frequently 
the periodicity test is made, and a possible pitch period recomputed. 

Table VII. Comparison of four pitch detectors (all parameters are the
same as for Table I, except that four pitch detectors are involved, and
$G_1$ and $G_2$ are optimized for each case)

  Type of pitch analysis                  AMDF          Autocorrelation
  Basis of the analysis                 X      XQ         X       XQ
  SNR-optimizing G-values
    (G_1 for AMDF, G_2 for
    correlation)                       0.84   0.84       0.20    0.30
  SNR (dB)                             14.4   10.0       13.8    10.1
  SNRSEG (dB)                          16.8   15.0       14.5    13.2

Table VIII. Dependence of performance on update time V; entries are SNR
values in dB (female utterance: F1; number of blocks: 134; male
utterance: M1; number of blocks: 134; pitch detector: based on
unquantized speech and AMDF; $G_1$ = 0.71)

  V           32      64      128     192
  Male         ?     12.1    11.4      ?
  Female       ?     14.4    12.8      ?

One further entry survives in each of the V = 32 and V = 192 columns
(15.1 and 9.8, respectively), but its row cannot be determined from the
source.

Recall that the update time assumed in Tables IV through VII was V
= 64 samples (8 ms). Previous researchers (Refs. 4-7) have usually
recommended V-values like 40 or 50.

VII. SUMMARY AND CONCLUSIONS 

Table IX compares, for the complete utterances F1, F2, M1, and M2,
the performance of pitch-adaptive DPCM coding with that of DPCM with
a fixed three-tap spectrum predictor. Note that both of these coders use
adaptive quantization. The conventional encoder uses a fixed spectrum
predictor, while the pitch-adaptive encoder includes a second adaptive
one-tap predictor, which is switched in whenever an AMDF analysis on
X suggests sufficient periodicity ($G_1 = 0.84$).

We note that there exists across the four sample sentences an average
3.8-dB SNR gain with pitch-adaptive coding. The better performance
with female speech is not surprising, since for a given duration of a
voiced speech utterance, the high-pitched female utterances have a
greater number of pitch periods.
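The average gain can be recomputed directly from the SNR columns of
Table IX. The short sketch below only restates that arithmetic; the
variable names are illustrative.

```python
# (SNR without pitch tracking, SNR with pitch-adaptive DPCM) in dB, from Table IX.
table_ix = {
    "F1": (10.0, 15.0),
    "F2": (14.0, 18.0),
    "M1": (11.0, 13.5),
    "M2": (11.0, 14.5),
}

gains = {name: adaptive - fixed for name, (fixed, adaptive) in table_ix.items()}
average_gain = sum(gains.values()) / len(gains)
female_gain = (gains["F1"] + gains["F2"]) / 2
male_gain = (gains["M1"] + gains["M2"]) / 2

print(gains, average_gain, female_gain, male_gain)
```

The unrounded average is 3.75 dB, with the two female utterances
averaging 4.5 dB and the two male utterances 3.0 dB.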

Figure 5 provides a typical time-plot of pitch period P and local
signal-to-noise ratio SNRV in pitch-adaptive coding. The example refers
to a segment from F2. A pitch period of zero in Fig. 5 indicates absence
of periodicity. Notice the low values of SNRV for these nonperiodic
blocks. Also, notice the cluster of three values of P ≈ 133. These three
estimates are obviously three times a true pitch period ≈ 44.

Table IX. Summary of DPCM encoder performance

               Median pitch     Number of speech    DPCM with no      Pitch-adaptive
               (number of       sample blocks       pitch tracking    DPCM
  Utterance    8-kHz samples)   (V = 64)            SNR (dB)          SNR (dB)
  F1                36               240                10.0              15.0
  F2                40               288                14.0              18.0
  M1                90               192                11.0              13.5
  M2                92               245                11.0              14.5

[Figure 5: pitch period P and SNRV plotted against block number (block
length = 64 samples).]

Fig. 5. Typical time variations of pitch period and local
signal-to-noise ratio SNRV. (Data refer to a segment from utterance F2.)

As mentioned earlier, the work in this paper was motivated by the
desire to improve waveform encoder performance at bit rates on the order
of 16 kb/s. The 2-bit pitch-adaptive coders discussed need 16 kb/s to
transmit prediction-error information; and if pitch analysis is to be
performed on uncoded speech, the transmission of this information to
a receiver will entail an additional channel capacity of about 1 kb/s.
This assumes that pitch-period samples are coded with 7-bit accuracy and
updated (and transmitted) once, say, every 56 samples (8 kHz × 7 bits/56
= 1 kb/s). Alternatively, the coder can be used on a 16-kb/s channel if
the sampling rate can be restricted to 15 kb/s / 2 bits = 7.5 kHz.
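The bit-rate budget in the preceding paragraph amounts to the following
arithmetic, restated here as a small sketch with illustrative names.

```python
def side_info_rate(fs_hz=8000, bits_per_pitch=7, update_samples=56):
    """Bits per second needed to send one pitch value every update_samples samples."""
    return fs_hz * bits_per_pitch / update_samples


waveform_rate = 2 * 8000                       # 2-bit DPCM words at 8 kHz
total_rate = waveform_rate + side_info_rate()  # with pitch side information
reduced_fs = (16000 - side_info_rate()) / 2    # sampling rate fitting a 16-kb/s channel

print(side_info_rate(), total_rate, reduced_fs)
```

With the figures quoted above, the side information costs 1 kb/s, the
total is 17 kb/s at 8-kHz sampling, and a 16-kb/s channel is met at a
sampling rate of 7.5 kHz.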